GraphZip: Mining Graph Streams using Dictionary-based Compression

Authors

  • Charles Packer
  • Lawrence Holder
Abstract

A massive amount of data generated today on platforms such as social networks, telecommunication networks, and the internet in general can be represented as graph streams. Activity in a network’s underlying graph generates a sequence of edges in the form of a stream; for example, a social network may generate a graph stream based on the interactions (edges) between different users (nodes) over time. While many graph mining algorithms have already been developed for analyzing relatively small graphs, graphs that begin to approach the size of real-world networks stress the limitations of such methods due to their dynamic nature and the substantial number of nodes and connections involved. In this paper we present GraphZip, a scalable method for mining interesting patterns in graph streams. GraphZip is inspired by the Lempel-Ziv (LZ) class of compression algorithms, and uses a novel dictionary-based compression approach to discover maximally-compressing patterns in a graph stream. We experimentally show that GraphZip is able to retrieve complex and insightful patterns from large real-world graphs and artificially-generated graphs with ground truth patterns. Additionally, our results demonstrate that GraphZip is both highly efficient and highly effective compared to existing state-of-the-art methods for mining graph streams.
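The abstract does not spell out GraphZip's algorithm, so the following is not GraphZip itself. It is a minimal sketch of the classic LZ78 dictionary scheme on plain strings, shown only to illustrate the dictionary-based idea the abstract cites as inspiration: each new phrase is an already-seen phrase extended by one symbol, so recurring structure accumulates in the dictionary. The function names `lz78_compress` and `lz78_decompress` are illustrative choices, not names from the paper.

```python
def lz78_compress(text):
    """Encode text as (dictionary index, next character) pairs (LZ78)."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    pairs = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending while the phrase is already known.
            phrase = candidate
        else:
            # Emit (index of known prefix, new character) and learn the phrase.
            pairs.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as prefix + last char.
        # Every prefix of a dictionary phrase is itself in the dictionary.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs, dictionary


def lz78_decompress(pairs):
    """Rebuild the original text from LZ78 (index, character) pairs."""
    phrases = [""]  # position in this list mirrors the encoder's indices
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    pairs, dictionary = lz78_compress("abababab")
    print(pairs)
    print(sorted(dictionary, key=dictionary.get))
    print(lz78_decompress(pairs))
```

In this string setting, the longest dictionary entries correspond to the most repeated substrings. A graph-stream analogue would presumably store subgraph patterns rather than substrings, but how GraphZip does that is described in the full paper, not here.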

Similar resources

GraphZip: Dictionary-based Compression for Mining Graph Streams

A massive amount of data generated today on platforms such as social networks, telecommunication networks, and the internet in general can be represented as graph streams. Activity in a network’s underlying graph generates a sequence of edges in the form of a stream; for example, a social network may generate a graph stream based on the interactions (edges) between different users (nodes) over t...

A framework for clustering massive graph streams

In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the ...

Efficient Optimal Recompression

An efficient variant of an optimal algorithm is presented, which reorganizes data that has been compressed by some on-the-fly compression method, into a more compact form, without changing the decoding procedure. The algorithm accelerates and improves the space requirements of a known technique based on a reduction to a graph-theoretic problem, by reducing the size of the graph, without affecting the...

Frequent Pattern Mining from Dense Graph Streams

As technology advances, streams of data can be produced in many applications such as social networks, sensor networks, bioinformatics, and chemical informatics. These kinds of streaming data share a property in common—namely, they can be modeled in terms of graph-structured data. Here, the data streams generated by graph data sources in these applications are graph streams. To extract implicit,...

Publication date: 2017